Device for extracting the excitation function from speech signals



July 21, 1959. C. P. SMITH. DEVICE FOR EXTRACTING THE EXCITATION FUNCTION FROM SPEECH SIGNALS. Original Filed June 27, 1952. United States Patent Office. Caldwell P. Smith, Bedford, Mass., assignor to the United States of America as represented by the Secretary of the Air Force. Original No. 2,691,137, dated October 5, 1954, Serial No. 296,101, June 27, 1952. Application for reissue April 23, 1956, Serial No. 580,152.

11 Claims. (Cl. 179-1) (Granted under Title 35, U.S. Code (1952), sec. 266) Matter enclosed in heavy brackets appears in the original patent but forms no part of this reissue specification; matter printed in italics indicates the additions made by reissue.

The invention described herein may be manufactured and used by or for the Government for governmental purposes without the payment to me of any royalty thereon.

This invention relates to a device for producing electrical signals, one of which is indicative of the total amount of energy in a plurality of channels, another of which is available for distinguishing between voiced and unvoiced speech sounds, and a third of which is available as a pitch signal.

The device shown in the single figure of the accompanying drawing and contemplated hereby comprises an input bus 35 of two conductors on which is impressed an electrical signal that may be indicative of sound, such as human speech.

The bus 35 is connected in parallel to a plurality of channels 1 to 32, inclusive, of which the channels 3 to 30 are not shown for simplicity. The components within the channels are substantially duplicates of each other and hence corresponding channel components are designated by corresponding numerals primed.

The channels are of narrow frequency bandwidths which divide the speech or other frequencies into contiguous channels. Each filter is of the tuned audio bandpass type shown in the drawing in the dashed block designated 36 as a variable Q audio filter or more simply as a single tuned circuit. The filters are arranged with a frequency separation from each other corresponding to the Koenig scale. Each filter passes its output to a balanced full-wave rectifier 37.
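The behavior of one channel, a single tuned circuit followed by a full-wave rectifier, can be approximated digitally. The following is only a minimal sketch under stated assumptions: the two-pole digital resonator, the sample rate, and the names `single_tuned_filter`, `full_wave_rectify` and `mean_rectified_level` are illustrative and do not appear in the patent, which describes an analog vacuum-tube circuit with a balanced (positive and negative) rectifier output.

```python
import math


def single_tuned_filter(samples, center_hz, bandwidth_hz, sample_rate_hz):
    """Two-pole digital resonator standing in for one single tuned circuit 36.

    Assumption: a digital resonator is used here only as an analogy for the
    analog variable-Q tuned circuit described in the text.
    """
    r = math.exp(-math.pi * bandwidth_hz / sample_rate_hz)  # pole radius sets bandwidth
    theta = 2.0 * math.pi * center_hz / sample_rate_hz      # pole angle sets center frequency
    a1 = 2.0 * r * math.cos(theta)
    a2 = -r * r
    gain = 1.0 - r                                          # rough level normalization
    y1 = y2 = 0.0
    out = []
    for x in samples:
        y = gain * x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def full_wave_rectify(samples):
    """Full-wave rectification: both half-cycles contribute with the same sign."""
    return [abs(s) for s in samples]


def mean_rectified_level(samples, center_hz, bandwidth_hz, sample_rate_hz):
    """Average rectified output of one channel for the given input samples."""
    rectified = full_wave_rectify(
        single_tuned_filter(samples, center_hz, bandwidth_hz, sample_rate_hz)
    )
    return sum(rectified) / len(rectified)
```

A tone at the channel's center frequency should produce a much larger rectified level than a tone several channels away, which is the property the contiguous filter bank relies on.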

Electrical signals suitable for distinguishing between voiced and unvoiced sounds are obtained by the division of the network into two sections, one embracing low frequency balanced outputs and the other embracing high frequency balanced outputs from the rectifiers 37, 37', etc. The division between these two sections is at approximately 3,000 cycles per second for male voices. The frequency division is so connected that three pairs of outputs numbered 42, 43 and 44 are obtained therebetween from a plurality of interconnected resistances. This plurality of interconnected resistances is connected with two pairs of input terminals, one from the low frequency filters and one from the high frequency filters, and has three pairs of output terminals 42, 43 and 44.

The first output 44 from the plurality of interconnected resistances is in accordance with the summation of all the positive and negative voltages as a measure of the total energy in the speech signal. The second output 42 is in accordance with the difference between the positive and negative voltages from the low frequency filter network, which is indicative of a pitch signal and uninfluenced by the positive and negative voltages from the high frequency channels. The third output 43 is equal to the algebraic difference of the high frequency and low frequency outputs going to a detector not shown, and designated on the drawing as voiced-unvoiced detector. The outputs of the low frequency and high frequency filter groups are mixed through the plurality of interconnected resistors at contacts 43 to produce a difference voltage.
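The relation between the two channel groups and the three outputs can be illustrated numerically. This is a hedged sketch only: it idealizes the resistor matrix as unit-weight sums and differences, treats each channel's balanced rectifier output as a single non-negative magnitude, and uses a hypothetical function name and a caller-supplied crossover index.

```python
def excitation_outputs(channel_levels, first_high_channel):
    """Combine rectified channel levels into the three outputs described above.

    channel_levels: rectified level of each channel at one instant, ordered
        from lowest to highest center frequency.
    first_high_channel: index (0-based) of the first channel in the high
        frequency group. Assumption: the crossover is supplied by the caller.

    Returns (total_energy, pitch_signal, voiced_unvoiced_signal), standing in
    for terminal pairs 44, 42 and 43 respectively. Resistor weights are
    idealized as unity, which is an assumption, not a value from the drawing.
    """
    low_sum = sum(channel_levels[:first_high_channel])
    high_sum = sum(channel_levels[first_high_channel:])
    total_energy = low_sum + high_sum       # terminals 44: sum of both groups
    pitch_signal = low_sum                  # terminals 42: low group only
    voiced_unvoiced = high_sum - low_sum    # terminals 43: algebraic difference
    return total_energy, pitch_signal, voiced_unvoiced
```

For example, `excitation_outputs([1.0, 2.0, 0.5, 0.25], 2)` gives a total of 3.75, a pitch signal of 3.0 and a difference of -2.25, the negative sign reflecting energy concentrated in the low group.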

The channel frequencies of 1,000 cycles per second and below are increased by hundreds. Above 1,000 cycles per second the frequencies increase in a logarithmic relation. Voiced sounds provide exponential pulses that are periodic at the frequency of the fundamental pitch. Unvoiced sounds are shown by wave traces having ragged edges. The detector to be connected with the terminal 43 is commonly available and hence is not shown.

The heart of the invention resides in the combination of the filters 36, the full-wave rectifiers 37 and the matrix or bridge system of the plurality of interconnected resistors 70 to 79, inclusive, and the capacitors 80, 81 and 82, between the members of the contact pairs 42, 43 and 44, whereby the low and high frequency inputs yield the three respective outputs at the terminals 42, 43 and 44.

The full-wave rectifier 37 gives a balanced D.C. output from an A.C. input. The plurality of interconnected resistances designated in the drawings 70 to 79, inclusive, is a highly complex device with an apparent mode of operation.

Illustrative filter frequencies and bandwidths for the channels are shown below:

FILTER FREQUENCIES AND BANDWIDTHS
f0 (cycles/sec.)    Δf (cycles/sec.)

In the above chart, channels 1 through 20 cover the low frequency field up to approximately 2481 cycles per second. Signals above this frequency are handled by the remaining channels in the device.
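Because the tabulated values are not reproduced here, the channel spacing can only be illustrated, not restated. The sketch below generates a plausible set of center frequencies from the figures the text does give: steps of 100 cycles per second up to 1,000, logarithmic spacing above, 32 channels in all, and roughly 2481 cycles per second at channel 20. Treating 2481 as the center frequency of channel 20, and assuming a constant ratio between adjacent channels above 1,000, are assumptions made for illustration; the function name is hypothetical.

```python
def channel_centers(n_channels=32, linear_step_hz=100.0, linear_top_hz=1000.0,
                    anchor_channel=20, anchor_hz=2481.0):
    """Illustrative center frequencies: linear steps, then a constant ratio.

    Assumptions (not from the patent's table): the linear region ends exactly
    at linear_top_hz, and the logarithmic region uses one fixed ratio chosen
    so that channel `anchor_channel` lands on `anchor_hz`.
    """
    n_linear = int(round(linear_top_hz / linear_step_hz))
    centers = [linear_step_hz * (k + 1) for k in range(n_linear)]
    # Ratio between adjacent channels in the logarithmic region.
    ratio = (anchor_hz / linear_top_hz) ** (1.0 / (anchor_channel - n_linear))
    while len(centers) < n_channels:
        centers.append(centers[-1] * ratio)
    return centers
```

Under these assumptions the ratio between adjacent channels above 1,000 cycles per second comes out a little under 1.1, so each channel sits roughly 9 to 10 percent above its neighbor.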

Each single tuned circuit 36 comprises tuned circuit tube sections 50 and 50' with an inductor 51 shunted by a variable capacitor 52 between the grids of the tube sections. The cathodes of the tube sections 50 and 50' are shunted by an adjustable Q control potentiometer resistor 53, to which a variable tap 54 is adjustably applied. A two section cathode follower tube has one section 60 connected directly with one of the input leads from the bus 35 that also is connected through a resistor 55 with the grid of the tuned circuit first tube section 50 and through a resistor 58 with a B+ current source. The grid of the second section 60' of the cathode follower tube is connected directly to the other lead from the bus 35 that is connected directly to the plate of the tuned circuit tube section 50 and through resistor 57 with the B+ power supply. The B+ power supply is applied directly to the plates of the cathode follower tube sections 60 and 60' and through resistors 57 and 58 with the plates of the tuned circuit tube sections 50 and 50' respectively. The grid of the second section 50' of the tuned circuit tube is connected through a resistor 56 with the second lead from the bus 35 and with the grid of the cathode follower tube section 60. Output from the single tuned circuit 36 is from the two cathodes of the cathode follower tube sections 60 and 60' to the full-wave rectifier 37.

In the measurement of the excitation function, the plurality of interconnected resistances in the matrix containing the contacts 42, 43 and 44 are connected directly to the rectified output from the filters to provide a measure of the energy envelope as it fluctuates in time, averaged over one three-hundredth of a second by means of the resistance-capacitance smoothing circuit shown. The signal generated by this summation provides a representation of the excitation emitted from the vocal cords. The exponential wave rises steeply and decays gradually in synchronization with the opening and the closing of the vocal flaps during phonation.
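The resistance-capacitance smoothing can be sketched as a one-pole low-pass filter. This is an illustration under an assumption: the "one three-hundredth of a second" averaging interval is read here as the RC time constant, which the text does not state explicitly. The function name and sample rate are likewise illustrative.

```python
import math


def rc_smooth(samples, sample_rate_hz, time_constant_s=1.0 / 300.0):
    """One-pole low-pass filter approximating an RC smoothing circuit.

    Assumption: the averaging interval in the text is interpreted as the
    RC time constant (default 1/300 second).
    """
    # Per-sample smoothing factor derived from the exponential RC response.
    alpha = 1.0 - math.exp(-1.0 / (sample_rate_hz * time_constant_s))
    y = 0.0
    out = []
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out
```

Fed with a sudden burst of rectified energy, this filter responds with an exponential decay once the burst ends, which matches the qualitative pulse shape described for voiced excitation.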

In order to differentiate between voiced and unvoiced sounds, at the contacts 43 the summation network is divided into two sections, one of which sums the energy envelopes from all the low frequency filters and the other sums the high frequency filters. As the formants generally lie below 3,000 cycles per second the crossover point has been set at this frequency. The two summation signals are mixed to produce a detection signal proportional to the difference between high and low frequency energy. In the resulting voltage wave form the sibilants produce upward deflection having ragged edges and the voiced sounds produce exponential pulses deflecting downward and periodic at the frequency of the fundamental pitch.
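Since the detector itself is not shown, the decision it might make can only be sketched. The following assumes a simple sign test on the difference signal with an optional dead band; the function name, the dead band and the labels are hypothetical and are not taken from the patent.

```python
def classify_frame(low_sum, high_sum, dead_band=0.0):
    """Label one instant as voiced or unvoiced from the two summation signals.

    Follows the polarity described in the text: the difference is taken as
    high minus low, so sibilants deflect upward (positive) and voiced sounds
    deflect downward (negative). The dead band is an illustrative assumption.
    """
    difference = high_sum - low_sum
    if difference > dead_band:
        return "unvoiced"
    if difference < -dead_band:
        return "voiced"
    return "undecided"
```

For instance, a frame with low-group energy 3.0 and high-group energy 0.5 is labeled voiced, while 0.2 and 1.5 is labeled unvoiced.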

The low frequency summation signal provides a means of measuring the fundamental pitch of voiced sounds which is independent of the presence or absence of low frequency components in the original speech signal. This is in contrast with conventional techniques which use a low-pass filter to separate the fundamental pitch.
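One way to read a pitch value off the low frequency summation signal is to time its periodic pulses. The sketch below is an assumption about how such a measurement could be made, not a method stated in the patent: it counts upward crossings of a mid-level threshold and converts the mean spacing to a frequency. The function name and default threshold are hypothetical.

```python
def estimate_pitch_hz(envelope, sample_rate_hz, threshold=None):
    """Estimate pulse repetition rate from upward threshold crossings.

    envelope: samples of the low frequency summation signal.
    threshold: crossing level; defaults to midway between the minimum and
        maximum of the envelope (an illustrative choice).
    Returns the estimated frequency, or None if fewer than two pulses occur.
    """
    if not envelope:
        return None
    if threshold is None:
        threshold = 0.5 * (max(envelope) + min(envelope))
    crossings = [
        i for i in range(1, len(envelope))
        if envelope[i - 1] < threshold <= envelope[i]
    ]
    if len(crossings) < 2:
        return None
    mean_period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return sample_rate_hz / mean_period
```

Because the measurement uses the envelope of the formant-region channels rather than the fundamental itself, it does not depend on the fundamental being present in the input, which is the advantage the text claims over low-pass filtering.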

The pitched signal may be used as excitation for variable resonators in order to generate synthetic formants which can be added to the original speech in synchronism with the original excitation. This becomes more effective if the pitched signal is first passed through a nonlinear device in order to replace the high frequency harmonic structure extending up to 3,000 cycles per second which was contained in the original excitation pulses from the vocal cords and lost through progressive attenuation in the vocal resonators, the analyzing filters and the smoothing circuits.

The sum of the low- and high-frequency summation signals smoothed in a resistance-capacitance filter provides a running average of the total speech energy weighted by pre-emphasis of the high frequencies before analysis. This signal is used to control an automatic gain-control amplifier ahead of the speech analyzer in order to provide an automatic normalization of the speech energy level.
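The normalization step can be pictured as choosing a gain inversely proportional to the running total-energy signal. This is a simplified feedforward sketch under assumptions: the patent describes the signal controlling an amplifier ahead of the analyzer but gives no control law, so the target level, gain ceiling and function name below are all hypothetical.

```python
def agc_gain(total_energy, target_level=1.0, max_gain=100.0, floor=1e-9):
    """Gain that would bring the running total energy to a target level.

    Assumptions: inverse-proportional control law, a gain ceiling to avoid
    amplifying silence without bound, and a small floor to avoid division
    by zero. None of these values come from the patent.
    """
    return min(max_gain, target_level / max(total_energy, floor))
```

With these defaults, a running energy of 2.0 yields a gain of 0.5, a running energy of 0.25 yields a gain of 4.0, and silence is capped at the ceiling of 100.0.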

What I claim is:

1. A circuit for producing a plurality of electrical signals comprising a plurality of channels of contiguous frequency bands, and each channel containing a single tuned circuit, a balanced full-wave rectifier rectifying the output from said tuned circuit and having a positive and negative balanced output, a pair of smoothing resistors in the balanced output of each of said rectifiers and the plurality of said channels divided into two positive and negative outputs from low frequency channels and two outputs from high frequency channels, and a plurality of interconnected resistances between the balanced outputs from the pair of end resistors in the low frequency channels and from the end resistors in the high frequency channels leading to a pair of total energy signal terminals with a capacitor therebetween and a negative terminal resistively connected to the negative output from the low frequency channels and resistively connected to the negative output from the high frequency channels and a positive terminal resistively connected to the positive output from the low frequency channels and resistively connected to the positive output from the high frequency channels.

2. A circuit for producing a plurality of electrical signals, comprising a plurality of channels of contiguous frequency bands, and each channel containing a single tuned circuit, a balanced full-wave rectifier rectifying the output from said tuned circuit, and each channel terminating in a pair of resistors receiving their inputs from said balanced rectifier, said plurality of channels divided into low frequency channels and high frequency channels each having an output at a pair of channel leads of opposite polarity, and a plurality of interconnected resistances deriving their input from the pairs of channel leads and leading to a pair of pitch signal terminals with a capacitor therebetween and with a negative terminal resistively connected to the low frequency negative channel lead and with a positive terminal resistively connected to the low frequency positive channel lead.

3. A circuit for producing a plurality of electrical signals, comprising a plurality of channels of contiguous frequency bands, and each channel containing a single tuned circuit, a balanced full-wave rectifier rectifying the output from said tuned circuit, and a pair of resistors at the output end of each channel, said plurality of channels divided into low frequency channels increasing by 100 cycles per second between channels up to 1,000 cycles per second and said high frequency channels increasing between channels in a logarithmic relation above 1,000 cycles per second, a pair of low frequency channel terminal leads of opposite polarity, a pair of high frequency channel terminal leads of opposite polarity, and a plurality of interconnected resistances deriving their input from the two pairs of channel leads and providing three pairs of output circuit terminals inclusive of a pair of voiced-unvoiced detector terminals capacitively coupled together and with each separate detector terminal resistively connected to the low frequency channel lead of one polarity and to the high frequency channel lead of opposite polarity.

4. A circuit for producing a plurality of electrical signals comprising a plurality of tuned circuits each having a center frequency differing from the center frequency of each of the other circuits and having a predetermined bandwidth, said tuned circuits having a common input to which a complex electrical signal representative of sounds may be applied, those circuits having a center frequency at or immediately adjacent to the pitch of the sound comprising a low frequency group and said tuned circuits having center frequencies higher than those circuits comprising the low frequency group making up a higher frequency group, means to produce a balanced D.C. signal proportional to the sum of the outputs of those tuned circuits comprising the low frequency group, means to produce a balanced D.C. signal proportional to the sum of the outputs of those tuned circuits making up the high frequency group, a plurality of resistances defining a circuit having two pairs of inputs and three pairs of outputs and so interconnected that when the balanced D.C. signal proportional to the sum of the outputs of the low frequency group of tuned circuits is applied to one of said pair of input terminals and the balanced D.C. signal proportional to the sum of the outputs of those tuned circuits making up the high frequency group is applied to the other pair of input terminals, the first pair of output terminals that are capacitively coupled together and each of which first output terminals is resistively connected to both the low frequency and high frequency D.C. signal of the same polarity such that the first pair of output terminals will produce a total energy signal which is the sum of the two input signals, the second pair of output terminals that are capacitively coupled together and each of which second output terminals is resistively connected to an output from the low frequency group of tuned circuits such that the second pair of output terminals will produce a signal proportional only to the input signal representative of the output of the low frequency group and the third pair of output terminals that are capacitively coupled together and each of which third output terminals is resistively connected to an output terminal of one polarity from the low frequency group of tuned circuits and to an output terminal of opposite polarity from the high frequency group of tuned circuits such that the third pair of output terminals will produce a signal proportional to the difference between the first input signal and the second input signal.

5. In speech analysis, an input for speech waves, a multiplicity of channels of contiguous frequency bands for subdividing said speech waves into frequency subbands, a rectifying circuit rectifying the output from each frequency band channel, a means for adding said rectified signals from frequency bands in which vowel formants occur, thereby deriving a new signal indicative of voice pitch.

6. In speech analysis, an input for speech waves, a multiplicity of contiguous frequency bands for subdividing said speech waves into frequency subbands, a rectifying circuit rectifying the output from each frequency band channel, a means for summing said rectified signals from low-frequency bands in which principal vowel energy occurs, a means for summing said rectified signals from high-frequency bands in which principal consonant energy occurs, a means for subtracting said sum signals, thereby deriving a signal indicative of presence or absence of voicing of the speech signal.

7. In speech analysis, an input for speech waves, a multiplicity of contiguous frequency band channels for subdividing said speech waves into frequency subbands, a rectifying circuit rectifying the output from each frequency band, a means for summing said rectified signals, thereby deriving a D.C. signal proportional to total speech signal energy.

8. In speech analysis, an input for speech waves, a multiplicity of channels of contiguous frequency bands for subdividing said speech waves into frequency subbands, a rectifying circuit rectifying the output from each frequency band channel, a means for adding said rectified signals from a group of lower frequency bands in which vowel formants occur, means for adding said rectified signals from a group of higher frequency bands, and means for obtaining the algebraic difference between thedicative of voice pitch.

9. In speech analysis, an input for speech waves, a multiplicity of contiguous frequency band channels for subdividing said speech waves into higher and lower groups of frequency subbands, a rectifying circuit rectifying the output from each frequency subband, a means for summing said rectified signals from a first lower frequency group of subbands as vowel sound energy, a second means for summing said rectified signals from a second higher frequency group of subbands as consonant sound energy, and a third means for summing rectified signals from all of the subbands and thereby deriving a direct current signal proportional to total speech signal energy.

10. In speech analysis circuitry, an input for speech waves, a plurality of channels subdividing the speech waves into subbands of distinctive frequency limits, rectifying means rectifying the output from each subband, and a plurality of resistor filter means separately presenting from the rectified subband outputs a pitch signal and a voiced-unvoiced signal and a total energy signal as separate distinctive output signals of the subbands.

11. A speech analysis circuit, comprising a speech signal input means, a plurality of single tuned circuit initiated channels of progressively differentiated frequency bands in the different channels, a single tuned circuit multivibrator at the input end of each channel comprising a pair of first triode vacuum tubes having first and second input grid electrodes separately connected to the speech signal input means and connected together through a capacitor shunted inductor and a pair of cathode electrodes connected together through a potentiometer resistor adjustably contacted by a tap connected to an end thereof and having a pair of first tube plate electrodes separately connected to the speech signal input means in reverse order from the grid connections with the speech signal input means, a pair of cathode follower second triode vacuum tubes each having a control grid electrode connected to one plate electrode of the first tube and each having a cathode electrode providing an output for the single tuned circuit, a full wave rectifier connecting in series with the single tuned circuit in each channel, and a plurality of sound analyzing interconnected resistors summing sounds from the channel outputs as distinctive pitch signal, voiced-unvoiced detector signal and as a total energy signal.

References Cited in the file of this patent or the original patent

UNITED STATES PATENTS

2,098,956  Dudley  Nov. 16, 1937
2,512,889  Dreyfus  June 27, 1950
2,522,539  Riesz  Sept. 19, 1950
2,575,909  Davis et al.  Nov. 20, 1951
2,575,910  Mathes  Nov. 20, 1951
2,646,465  Davis  July 21, 1953

